Increased signaling entropy in cancer requires 
the scale-free property of protein interaction 
networks 


Andrew E. Teschendorff, Chris R. S. Banerji, Simone Severini, Reimer Kuhn and 
Peter Sollich 


Abstract One of the key characteristics of cancer cells is an increased phenotypic 
plasticity, driven by underlying genetic and epigenetic perturbations. However, at 
the systems level it is unclear how these perturbations give rise to the observed 
increased plasticity. Elucidating such systems-level principles is key for an improved 
understanding of cancer. Recently, it has been shown that signaling entropy, an 
overall measure of signaling pathway promiscuity that is computable by integrating a 
sample’s gene expression profile with a protein interaction network, correlates with 
phenotypic plasticity and is increased in cancer compared to normal tissue. Here 
we develop a computational framework for studying the effects of network 
perturbations on signaling entropy. We demonstrate that the increased signaling entropy 
of cancer is driven by two factors: (i) the scale-free (or near scale-free) topology 
of the interaction network, and (ii) a subtle positive correlation between differential 
gene expression and node connectivity. Indeed, we show that if protein interaction 
networks were random graphs, described by Poisson degree distributions, cancer 
would generally not exhibit an increased signaling entropy. In summary, this 
work exposes a deep connection between cancer, signaling entropy and interaction 
network topology. 

Keywords: entropy; cancer; scale free; network; perturbation; genomics; 
signaling; intra-tumour heterogeneity 


Introduction 

One of the key features of cancer is an increased cellular plasticity, mediated by an 
increased promiscuity in signaling patterns, and driven by underlying genetic and 
epigenetic aberrations which cause a fundamental rewiring of the intracellular 
signaling network. Every aberration found in a cancer cell 
can be thought of as a perturbation if the aberration affects the gene functionally. 
Such perturbations can be classed as activating, if they result in an increased 
functional activity of the gene (e.g. amplification and overexpression of ERBB2 in breast 
cancer), or inactivating, if they compromise gene function (e.g. silencing through 
promoter DNA methylation). Whilst the effect of certain specific perturbations on gene 
function can be predicted, it is much less clear how individual perturbations affect 
the cellular phenotype as a whole, since this depends on the collective nature of 
the other aberrations that are present in the same cell. Predicting the net effect of 
multiple perturbations in a signaling network is hard due to complex effects such 
as pathway redundancy and epistasis [9, 11]. Moreover, in the context of cancer, 
although the effect of specific aberrations on cell function is known, it is as yet unclear 
how individual cancer perturbations may contribute to the observed increased 
signaling promiscuity and phenotypic plasticity. 

One way to approach this challenge computationally is to anchor the analysis on 
global measures which capture salient features of the cellular phenotype, and which 
are computable from, say, a sample’s molecular profile (e.g. a sample’s gene 
expression profile). Here we are particularly interested in measuring signaling promiscuity 
since evidence is mounting that this underlies a sample’s phenotypic plasticity [26]. 
In previous work we have started to explore a measure which approximates 
intra-sample signaling promiscuity, and which is known as network signaling entropy. 
Signaling entropy is computed by integrating a sample’s genome-wide 
gene expression profile with a protein interaction network and, as shown by 
us, provides a surprisingly good estimate of a sample’s height in Waddington’s 
differentiation landscape, with human embryonic stem cells (hESCs) exhibiting the 
highest levels of entropy. Indeed, signaling entropy was able to discriminate 
cellular samples according to their differentiation potential within distinct lineages, 
including hematopoietic, mesenchymal and neural lineages, with terminally 
differentiated cells within these lineages exhibiting the lowest levels of entropy. 
Importantly, signaling entropy was also found to be higher in cancer compared to 
normal tissue, consistent with the view that cancer cells represent a more 
undifferentiated stem-cell like state, characterised by an increase in phenotypic plasticity. 

Given that increased signaling entropy is such a robust and characteristic feature 
of differentiation potency and cancer, and that it is also amenable to 
computation [39], it is of great theoretical and biological interest to study the changes 
in entropy caused by cellular network perturbations. In the context of cancer, two 
well-known network perturbations are the overexpression and underexpression of 
oncogenes and tumour suppressor genes, respectively, and although these 
perturbations are known to result in the uncontrolled activation of cell-growth and 
cell-proliferation pathways, it remains unclear how these perturbations affect signaling 
promiscuity. In order to deepen our understanding, we here decided to study the 
effect of such perturbations on signaling entropy, using both simulated and real 
data, and using a variety of different network types in order to assess the impact of 
network topology. Specifically, we consider Erdos-Renyi random (Poisson) graphs, 
scale-free networks, as well as real protein-protein interaction (PPI) 
networks [28, 7, 30]. In doing so, we discover that in Poisson networks, perturbations 
(be they activating or inactivating) lead to reductions in the global entropy, but that 
this is not true for scale-free and more realistic PPI networks. In networks exhibiting 
a scale-free, or near scale-free topology, we show that gene expression perturbations 
affecting hubs exhibit a striking bi-modality, leading to increases or decreases in the 
global entropy rate depending on the directionality of the expression change. We 
further expose a subtle yet significantly positive correlation between differential gene 
expression in cancer and node-degree, which we show drives the increased 
signaling promiscuity of cancer, but only if the underlying protein interaction network has 
a scale-free (or near scale-free) topology. Thus, this work makes a deep connection 
between a defining feature of the cancer phenotype, i.e. high signaling entropy, its 
differential gene expression pattern and the (near) scale-free topology of real PPI 
networks. 

Although there are many studies on network perturbations, it is worth clarifying that 
the network perturbations and outcome of interest (i.e. the entropy rate) considered 
in this work are very different from the perturbations and outcomes of interest 
considered in previous studies. Specifically, we consider network 
perturbations which only alter the local edge weights without altering the 
underlying network topology. Moreover, our network perturbations can be 
both activating as well as inactivating, representing the two different types of cancer 
alterations affecting oncogenes and tumour suppressors, respectively. In contrast, 
much of the previous literature has dealt with the effects of removing specific nodes 
in unweighted networks, a type of inactivating perturbation which alters 
the underlying network topology, focusing on tolerance and robustness as outcome 
measures. Thus, from a network theoretical perspective, the important 
novel insights reported in this work are made possible by considering a novel type 
of network perturbation in the context of weighted networks defined by a stochastic 
matrix. We should also stress that our outcome of interest, signaling entropy, is a 
systems-level measure that is constructed from the genome-wide expression profile 
of a given sample, and therefore has little to do with the protein signaling disorder 
measures considered by other studies, which do not use gene expression data. 


Results 

Increased signaling entropy in cancer is driven by overexpression 
of hub genes 


In earlier work we demonstrated that signaling entropy, a measure of the 
signaling promiscuity in a cellular sample, is increased in cancer compared to normal 
tissue, irrespective of tissue type. This increased signaling entropy is 
consistent with the observed increased phenotypic plasticity of cancer cells (see e.g. 
[26]). Thus, increased signaling entropy has emerged as a cancer systems hallmark 
[39, 34]. Signaling entropy is estimated as the entropy rate SR of a sample-specific 
stochastic matrix which models the signaling interactions in the sample (Appendix). 
This stochastic matrix is computed by integrating the gene expression profile of the 
sample with a comprehensive PPI network, invoking the mass-action principle to 
define the edge-weights in the network (Appendix). The mass-action principle is 
based on the assumption that two proteins, which have been reported to interact, are 
more likely to interact in a given sample if both are highly expressed in that sample. 
Here we wanted to shed light on why, theoretically, we observe increased 
signaling entropy in cancer. We decided to use liver cancer as a model since liver 
represents a relatively homogeneous tissue, and is thus less affected by contaminating 
non-epithelial cells. We downloaded gene-normalised RNA-Seq data for a matched 
subset of 50 normal liver and 50 liver cancer samples from The Cancer Genome 
Atlas (TCGA). Confirming our earlier work using Affymetrix gene expression data, 
liver cancer exhibited a significantly higher signaling entropy rate 
compared to normal liver tissue (Fig. 1A). 

Randomisation of the RNA-Seq profiles over the nodes in the network resulted in 
a significantly reduced difference in entropy rate between normal and cancer tissue 
(Fig. 1A), indicating (as pointed out by us previously [34]) that the entropy increase 
in cancer is driven by a subtle interplay between specific gene expression changes 
and where these happen on the network. Specifically, we posited that the topological 
properties of the genes undergoing the largest changes in gene expression would be 
key features dictating the change in signaling entropy. 

Since each gene $i$ contributes an amount $\pi_i LS_i$ to the entropy rate of a given sample 
(Appendix), we computed for each gene the difference in the means of its local 
entropy rate, $\pi_i LS_i$, between normal and cancer tissue. In order to help interpretation, 
we also computed for each gene the difference in the means of the invariant 
measure $\pi_i$ between normal and cancer, as well as the difference in the average local 
entropy $LS_i$ (Appendix). All these changes were assessed in relation to the 
connectivity of the genes in the network. We observed that the entropy rate increase 
in cancer is driven mainly by hubs, i.e. the nodes of highest degree in the network 
(Fig. 1B). Changes to the local entropy rates were driven by concomitant changes in 
the average invariant measure (Fig. 1C). Thus, hubs exhibited preferential increases 
in their average invariant measure, whilst also demonstrating positive increases in 
the average local entropy (Fig. 1D). Since the invariant measure value at a node $i$ 
represents the steady-state probability of finding a random walker at this node, the 
observed preferential increase in the invariant measure at hubs means that there is 
an increased signaling flux through these hub nodes in cancer. 
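
As a hedged aside, the link between hub expression and invariant measure can be made explicit under two assumptions that are ours rather than statements from the text above: that the PPI network is undirected, and that the edge probabilities follow the mass-action definition of the Appendix, $p_{ij} = E_j / \sum_{k \in N(i)} E_k$. In that case the random walk is reversible and its invariant measure has a closed form:

```latex
\[
\pi_i \;=\; \frac{E_i \sum_{k \in N(i)} E_k}{Z},
\qquad
Z \;=\; \sum_{l} E_l \sum_{k \in N(l)} E_k ,
\]
% Check via detailed balance: for any interacting pair (i, j),
\[
\pi_i \, p_{ij}
\;=\; \frac{E_i \sum_{k \in N(i)} E_k}{Z} \cdot \frac{E_j}{\sum_{k \in N(i)} E_k}
\;=\; \frac{E_i E_j}{Z}
\;=\; \pi_j \, p_{ji},
\]
% and summing over i gives \sum_i \pi_i p_{ij} = \pi_j, i.e. \pi p = \pi.
% Writing x = E_h \sum_{k \in N(h)} E_k for a hub h, and c for the part of Z
% that does not involve E_h, one has Z = 2x + c and hence
\[
\pi_h \;=\; \frac{x}{2x + c},
\]
% which increases monotonically with E_h (and is bounded above by 1/2).
```

Under these assumptions, raising the expression of a hub $h$ raises its own invariant measure $\pi_h$, in line with the increased signaling flux through hubs described above.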

To gain insight as to why there is an increased signaling flux through hubs in 
cancer, we focused on the hub gene exhibiting the largest increase in the local 
entropy rate. This was the gene BUB1 (Fig. 2A). A scatterplot of the expression values 
of BUB1 and its 813 neighbors in a representative normal 
sample versus the corresponding expression values in a representative cancer sample 
demonstrates that most of the expression differences involve increases in gene 
expression, implicating both the hub itself as well as some of its neighbors (Fig. 2B). 
Thus, for the majority of neighbors of BUB1, the increased expression of BUB1 
will, according to the mass action principle, drive increased signaling through this 
hub. Indeed, for each one of BUB1’s neighbors we ranked its neighbors according to 
the largest increase in gene expression, revealing that the original hub (i.e. BUB1) 
ranked within the top 2% for 99% of the hub neighbors (SI Appendix, 
fig. S1). Interestingly, this effect was not unique to BUB1 since high-degree hubs 
generally exhibited a significant skew towards increased gene expression in cancer 
(Fig. 2C-D). 

Confirming the biological significance of these results, we reached very similar 
conclusions by repeating the above analysis in the independent Affymetrix gene 
expression data set of normal liver and liver cancer tissue [40] (SI Appendix, fig. S2-S3). 
Thus, the increased entropy rate in liver cancer is driven mainly by the increased 
expression of the highest degree hubs in the PPI network. 


Effect of cancer perturbations on signaling entropy 

That the highest degree genes show preferential expression increases in cancer 
(Fig. 2C-D, SI Appendix fig. S3) suggests an intricate link between network 
topology and differential expression. Confirming this further, in both liver expression sets 
we also observed that the genes exhibiting the largest, or most significant, decreases 
in expression preferentially mapped to low-degree nodes (Fig. 2C-D, SI Appendix 
fig. S3-S4). 

This intricate correlation between differential expression and node degree 
motivated us to pursue a deeper understanding of the complex interplay between 
network topology, gene expression perturbations and entropy rate. Intuitively, and from 
the perspective of a gene i that interacts with an oncogenic hub, overexpression of 
the latter would lead to an increased outgoing signaling flux of node i towards the 
hub, potentially leading to an increase in the overall entropy rate (Fig. 3A). 
Interestingly, underexpression of a low-degree node, which may connect to a hub either 
directly or indirectly through an intermediate node i, would also lead to an increased 
signaling flux through the hub (Fig. 3A). Thus, the two characteristic topological 
features of differential gene expression changes in cancer could synergize, causing 
increased signaling flux through key hubs. To test whether this is indeed the case, we 
performed a perturbation analysis for the top 100 genes ranked according to 
fold-change between normal liver and liver cancer. The initial signaling distribution was 
defined by invoking the mass action principle on the average expression profile over 
all 50 normal liver samples. Next, each of the top 100 ranked genes was individually 
perturbed by changing its expression level according to the observed difference 
between normal and cancer tissue. Confirming our hypothesis, underexpressed genes 
(which generally did not target hubs) led to marginal increases in the entropy rate, 
whilst overexpressed hubs caused significant entropy increases (Fig. 3B). 
Interestingly, however, overexpression led to marginal entropy decreases whenever it did 
not target the highest degree hubs, suggesting that such perturbations draw away 
signaling flux from the major hubs (Fig. 3B). 


The effect of perturbations on signaling entropy is dependent on 
network topology 

To further investigate the effect of individual perturbations on signaling entropy, as 
well as the role of the underlying network topology, we devised a simulation 
framework on toy networks, perturbing each node in turn, and recording the effect on the 
entropy rate (Fig. 4A, Appendix). To simplify the analysis we considered an initial 
uniform edge weight configuration, defining an unbiased random walk on the graph. 
We note that this initial configuration represents a state of relatively high signaling 
entropy, but not of maximal entropy (see Appendix). As activating perturbations we 
consider local increases in gene expression, whereby all the edges 
converging on a perturbed node $i$ are assigned a relatively large weight (Fig. 4B). Thus, 
as seen from the perspective of a neighboring node $j$, before perturbation, node $j$ 
has maximal local entropy, given by $\log k_j$ (where $k_j$ is the degree of node $j$), whilst 
after the perturbation, the node’s local entropy is close to 0 (Fig. 4B). We 
emphasize again that although in the initial configuration all local entropies are maximal, 
the initial entropy rate over the whole network is not maximal (see Appendix). 
Thus, after the perturbation, the global entropy rate of the network could increase or 
decrease. 
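
The single-node perturbation scheme can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the simulation code used for Fig. 4: the toy graph, the function names and the `factor` parameter are hypothetical, a perturbation is modelled by setting the expression of one node to `factor` (all others stay at 1) and re-applying the mass-action rule, and the stationary distribution uses the closed form valid for a reversible walk on an undirected graph.

```python
import math


def entropy_rate(neighbours, expr):
    """Entropy rate of the mass-action walk p_ij = E_j / sum_{k in N(i)} E_k.

    Assumes an undirected graph, for which the stationary distribution is
    pi_i proportional to E_i * sum_{k in N(i)} E_k (reversible random walk).
    """
    strength = {i: expr[i] * sum(expr[k] for k in nbrs)
                for i, nbrs in neighbours.items()}
    total = sum(strength.values())
    rate = 0.0
    for i, nbrs in neighbours.items():
        denom = sum(expr[k] for k in nbrs)
        # Local entropy LS_i of node i
        local = -sum((expr[j] / denom) * math.log(expr[j] / denom) for j in nbrs)
        rate += (strength[i] / total) * local
    return rate


def perturb_each_node(neighbours, factor):
    """Perturb every node in turn and return the change in entropy rate.

    factor > 1 mimics an activating perturbation, factor < 1 an inactivating one.
    """
    base = {i: 1.0 for i in neighbours}          # unbiased random walk
    base_rate = entropy_rate(neighbours, base)
    changes = {}
    for i in neighbours:
        expr = dict(base)
        expr[i] = factor
        changes[i] = entropy_rate(neighbours, expr) - base_rate
    return base_rate, changes


# Hypothetical toy graph: node 0 is a small hub (degree 4)
neighbours = {0: [1, 2, 3, 4], 1: [0, 2], 2: [0, 1], 3: [0], 4: [0, 5], 5: [4]}
base_rate, delta_activating = perturb_each_node(neighbours, factor=10.0)
_, delta_inactivating = perturb_each_node(neighbours, factor=0.1)
```

Sorting `delta_activating` by node degree gives the kind of degree-versus-entropy-change profile discussed in the following paragraphs, although a graph this small cannot reproduce the topology-dependent behaviour of large networks.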

In order to understand the potential impact of network topology, we first 
conducted the perturbation analysis above on Erdos-Renyi (ER) random graphs, for 
which the degree distribution is Poisson. For such ER graphs, we observed that 
activating perturbations (i.e. increases in gene expression) always led to a reduction in 
the global entropy rate, irrespective of node degree (Fig. 4C). Repeating the analysis 
for inactivating perturbations, i.e. causing nodes to undergo underexpression, we 
observed that perturbing almost any node led to a decrease in entropy. Thus, given that cancer is 
characterised by an increase in signaling entropy, this suggests that the emergence 
of an increased signaling promiscuity regime in cancer must be due either to specific 
topological features not present in random graphs, or to non-random combinations 
of perturbations. 

To investigate this further, we next performed the same perturbation analysis as above, 
but now on networks characterised by a scale-free (or near scale-free) topology, a 
key feature of real biological networks. The scale-free networks were matched 
to the same size and average connectivity as the previously considered 
Erdos-Renyi graphs. Remarkably, in scale-free networks we observed that activating 
perturbations exhibited a bi-modal response, with perturbations at lower-degree nodes 
resulting in a reduction of the global entropy rate, whilst hubs exhibited increases 
(Fig. 4C). In fact, we observed two distinct regimes with an opposite functional 
relationship between entropy change and node-degree (Fig. 4C). In the low-degree 
regime, the entropy rate decreased as node degree increased, whereas in the 
high-degree regime we observed entropy increases (Fig. 4C). Interestingly, this bi-phasic 
behaviour was not seen for inactivating perturbations, where we observed a 
monotonic decrease of entropy with node degree (Fig. 4C). In stark contrast to Poisson 
networks, high-degree nodes in the scale-free network exhibited a bi-modal 
response dependent on the directionality of the perturbation (Fig. 4C): overexpressed 
hubs led to entropy increases, while underexpressed hubs led to corresponding 
decreases. 

Next, we wanted to test whether this bi-phasic and bi-modal behaviour is also seen 
in real PPI networks. We first checked that our PPI network exhibited an 
approximate scale-free topology (SI Appendix, fig. S5). Its clustering coefficient was also 
significantly higher than that of a degree-distribution matched scale-free network 
(SI Appendix, fig. S5). Performing the perturbation analysis on the PPI network, 
we observed once again two phases, which was particularly striking for activating 
perturbations, with one phase exhibiting a negative correlation between node degree 
and entropy, whilst the hub regime exhibited a positive correlation (Fig. 4C). Very 
interestingly, however, increases in entropy were only observed for the highest-degree 
hubs, with lower-degree hubs exhibiting decreases which were surprisingly also of 
a larger magnitude (Fig. 4C). Thus, in networks with a scale-free or an approximate 
scale-free topology, overexpression of the highest degree hubs leads to an increase 
in the entropy rate. But increasing signaling flux through lower-degree nodes, even 
if of relatively high degree, leads to an overall reduction in the diffusion rate. 

From the combined perturbation analysis, we can thus see that individual 
perturbations on an Erdos-Renyi graph, be they activations or inactivations (but both causing 
a local reduction in entropy), invariably lead to a reduction in the global entropy 
rate. This is in stark contrast to networks with a scale-free or approximate 
scale-free topology, where we observe that gene activations can have opposite effects on 
entropy rate depending on the degree of the activated nodes. 


Entropy rate increase in cancer requires a scale-free interaction 
network topology 


The previous perturbation analysis strongly supports the view that a scale-free, or 
near scale-free network topology, is important for the observed increased entropy 
rate in cancer. To test this formally, we recomputed the entropy rate of all 50 normal 
liver and 50 liver cancer samples, but now using an underlying Erdos-Renyi (ER) 
interaction network matched to the same size and average connectivity as the full 
PPI network. In order to faithfully preserve the correlation between gene expression 
and node degree of the PPI network, nodes of the ER network were ranked 
according to degree and gene expression values assigned according to their corresponding 
rank/centile in the original PPI network. Thus, this node mapping between the two 
networks preserves the observed rank correlation between differential expression 
and node-degree, allowing us to objectively assess the importance of the scale-free 
property. Recomputation of the entropy rates of all 100 samples on the ER network 
revealed no significant difference between normal and cancer, thus demonstrating 
that the observed entropy rate increase in cancer requires the scale-free property of 
the interaction network (Fig. 5A). Supporting this further, we observed, in two other 
matched normal-cancer RNA-Seq expression sets from the TCGA, that the entropy 
was no longer higher in cancer when the PPI network was replaced with an 
equivalent ER graph (Fig. 5B-C). In independent Affymetrix gene expression data, we 
observed that the cancer-associated increase in the entropy rate was reduced upon 
computing entropy on an equivalent ER network, in three out of four studies (SI 
Appendix, fig. S6). Thus, in 6/7 data sets, there was a reduction in the entropy rate 
difference between cancer and normal tissue (Binomial, P=0.008), supporting the 
view that a scale-free interaction topology is indeed necessary for the higher 
entropy signaling dynamics of cancer. 
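
The degree-rank mapping between the PPI network and its ER counterpart can be illustrated as follows. This is a minimal sketch with several assumptions of ours: the ER graph is drawn from the fixed-edge-count G(n, M) model so that size and average connectivity match exactly, ties in degree are broken arbitrarily, connectivity of the sampled graph is not enforced, and the function names and the four-gene toy network are hypothetical.

```python
import random


def erdos_renyi(n_nodes, n_edges, seed=0):
    """Sample a G(n, M) random graph as an adjacency list (dict of lists)."""
    rng = random.Random(seed)
    edges = set()
    while len(edges) < n_edges:
        a, b = rng.sample(range(n_nodes), 2)     # two distinct nodes
        edges.add((min(a, b), max(a, b)))        # store undirected edge once
    neighbours = {i: [] for i in range(n_nodes)}
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)
    return neighbours


def map_by_degree_rank(ppi_neighbours, er_neighbours):
    """Pair each ER node with the PPI gene holding the same degree rank."""
    ppi_ranked = sorted(ppi_neighbours,
                        key=lambda g: len(ppi_neighbours[g]), reverse=True)
    er_ranked = sorted(er_neighbours,
                       key=lambda v: len(er_neighbours[v]), reverse=True)
    return dict(zip(er_ranked, ppi_ranked))


# Hypothetical four-gene "PPI" network with four interactions
ppi = {"A": ["B", "C", "D"], "B": ["A", "C"], "C": ["A", "B"], "D": ["A"]}
er = erdos_renyi(n_nodes=len(ppi), n_edges=4, seed=1)
er_node_to_gene = map_by_degree_rank(ppi, er)

# Expression of one sample, transferred onto the ER nodes via the mapping
sample_expr = {"A": 5.0, "B": 2.0, "C": 2.5, "D": 1.0}
er_expr = {v: sample_expr[g] for v, g in er_node_to_gene.items()}
```

Because each ER node inherits the expression values of the PPI gene with the same degree rank, the rank correlation between expression and degree carries over to the ER graph, which is the property the comparison above relies on.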


Discussion 


Signaling entropy, a measure of the overall uncertainty or promiscuity in signaling 
patterns within a cellular sample, has been shown to be of biological significance 
in a variety of different contexts. In cellular differentiation it provides 
a proxy to the energy potential (i.e. height) of Waddington’s epigenetic landscape, 
allowing the differentiation potential of a sample to be assessed purely from its 
genome-wide transcriptomic profile. Similarly, signaling entropy also provided 
us with a useful framework in which to identify specific systems-level features 
characterising cancer, one of which is the increased signaling promiscuity of cancer 
compared to its corresponding normal tissue. This is important because 
an increased signaling promiscuity could underlie the increased phenotypic 
plasticity of cancer, as observed e.g. by Pisco et al. [26]. 

In this work we aimed to obtain a deeper theoretical understanding as to (i) why 
signaling entropy is increased in cancer and (ii) why it is such a robust discriminatory 
feature. We have here demonstrated that the increase in signaling entropy is driven 
by two factors. First, a subtle positive correlation between differential gene 
expression and the degree of the corresponding proteins in the PPI network. This 
correlation amounts to hubs exhibiting preferential increases in gene expression, whilst 
those genes exhibiting the most significant underexpression map preferentially to 
low-degree nodes. Second, the observed increase of entropy in cancer requires the 
scale-free (or near scale-free) topology characterising PPI networks. Indeed, by 
considering a Poisson network with an identical rank correlation coefficient between 
differential expression and node-degree, we no longer consistently observed a 
significantly increased entropy rate in cancer (Fig. 5). Given the demonstrated biological 
significance of the entropy rate, this last result thus exposes a deep 
connection between the cancer phenotype and the underlying scale-free property of real PPI 
networks. It suggests that if the degree distribution of a PPI network were Poisson, 
the transcriptomic changes seen in cancer would not define a highly 
promiscuous signaling regime. In other words, our data support the view that cancer “hijacks” 
the scale-free property of real signaling networks in order to facilitate increased 
signaling promiscuity and intra-tumour heterogeneity. 

The novel insights described above also explain why the entropy rate provides such 
a robust discriminatory feature of the cancer phenotype. The robustness stems from 
the subtle correlation between differential expression and node-degree. First, although 
gene expression data is notoriously noisy, there is generally speaking good 
agreement across independent studies when comparing the changes in differential gene 
expression between two marked phenotypes such as normal and cancer tissue [29]. 
Secondly, although current PPI networks only represent mere caricatures of the real 
interactions in a cell, the “hubness” of a protein is likely to be a very robust feature. 
Indeed, that a given protein has exceptionally many interactions, thus defining a hub 
in a network, is likely to hold despite the fact that the specific 
interaction space of the hub may contain many false negatives and false positives. 
Thus, the relative robustness of differential expression and hubness drives the 
robustness of the observed correlation between differential expression and node 
degree, which in turn explains why increased signaling entropy is such a consistent 
feature of the cancer phenotype. Given the robustness of signaling entropy 
as a marker of differentiation potency, it is therefore tempting to speculate that 
a subtle correlation between differential expression and node degree also exists in 
the context of normal cellular differentiation. Furthermore, it will be interesting to 
explore if the scale-free or near scale-free topology of PPI networks is also a key 
element underlying the nature of pluripotency, multipotency and terminal 
differentiation. 

Although many previous studies have explored differential gene expression changes 
in cancer and other diseases in relation to network topology, 
most of these have either focused on global topological properties, or 
on finding differential gene modules, or on studying absolute changes in differential 
expression. Indeed, a number of studies agree in reporting that absolute differential 
expression correlates negatively with node degree, meaning that hubs exhibit, on the 
whole, much smaller changes in expression between disease phenotypes [25, 22]. 
Interestingly, however, relatively little attention has been paid to studying the 
directionality of differential gene expression in cancer in relation to node degree. Here we 
have shown that there exists a subtle yet significantly positive correlation between 
differential expression and protein-degree. On its own, the biological significance 
of this correlation is unclear. However, by interpreting this correlation in the novel 
contextual framework of signaling entropy, we have here shown how, in the context 
of real (near) scale-free networks, it could underpin the increased phenotypic 
plasticity of cancer. 

In summary, increased expression of oncogenic hubs, as well as reduced 
expression of network-peripheral tumour suppressor genes, in interaction networks 
characterised by a (near) scale-free topology, drives the high signaling entropy of cancer 
and could thus underpin cancer’s phenotypic robustness and plasticity. Further 
in-depth study of the complex interplay between local protein activity changes, their 
interaction network topology and the effect on signaling entropy is warranted. 


Appendix 

The protein-protein interaction (PPI) network 

We used a PPI network similar to that used in our previous publication [38]. 
Briefly, the human interaction network derives from the Pathway Commons 
Resource (www.pathwaycommons.org) [7], which brings together protein interactions 
from several distinct sources, including the Human Protein Reference Database 
(HPRD) [28], the National Cancer Institute Nature Pathway Interaction Database 
(NCI-PID) (pid.nci.nih.gov), the Interactome (IntAct) http://www.ebi.ac.uk/intact/ 
and the Molecular Interaction Database (MINT) http://mint.bio.uniroma2.it/mint/. 
Protein interactions in this network include physical stable interactions such as 
those defining protein complexes, as well as transient interactions such as 
post-translational modifications and enzymatic reactions found in signal transduction 
pathways, including 20 highly curated immune and cancer signaling pathways from 
NetPath (www.netpath.org) [20]. The network focuses on non-redundant 
interactions, includes only nodes with an Entrez gene ID annotation, and is restricted to the maximally 
connected component thereof, resulting in a connected network of 8,434 nodes 
(unique Entrez IDs) and 303,600 documented interactions. 


Normal and cancer tissue gene expression data sets 

We focused on liver cancer because the associated normal tissue constitutes a 
relatively homogeneous mass of cells, and thus the entropy rate is less likely to be 
influenced by changes in tissue-type composition. We downloaded the level 3 gene 
normalised RNA-Seq data from the TCGA (www.cancergenome.nih) for a matched 
subset of 50 normal liver and 50 liver cancer samples. As validation, we considered 
an Affymetrix expression data set, consisting of 37 normal livers (including 
normal liver, cirrhosis and dysplasia) and 38 liver cancers [40]. To test generalisability, 
we also downloaded level 3 RNA-Seq gene normalised data from the TCGA for 
prostate cancer (52 cancers and 52 matched normals) and colon cancer (27 cancers 
and 27 matched normals). The other normal/cancer Affymetrix expression sets used 
have been described previously. 


Construction of the sample specific stochastic matrix and entropy 
rate 


The construction of the entropy rate follows the method described in our earlier
work [34]. Briefly, we use the mass action principle to define a stochastic
matrix, $p_{ij}$, for each individual sample. In detail, let $E_i$ denote the normalised
expression level of gene $i$ in a given sample. For a given neighbour $j \in N(i)$ (where
$N(i)$ labels the neighbours of $i$ in the PPI), the mass-action principle means that the
probability of interaction with $j$ is approximated by the product $E_i E_j$, i.e.
$p_{ij} \propto E_i E_j$. Normalising this to ensure that $\sum_{j \in N(i)} p_{ij} = 1$,
we get for the stochastic matrix,


$$ p_{ij} = \frac{E_j}{\sum_{k \in N(i)} E_k} \qquad \forall j \in N(i) \tag{1} $$
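As a minimal sketch of equation (1), and not the code used in this work, the stochastic matrix can be built from a neighbour list and an expression vector. The function name `stochastic_matrix`, the toy network and the expression values below are hypothetical and chosen purely for illustration.

```python
def stochastic_matrix(neighbours, expr):
    """Row-stochastic matrix of equation (1), stored as a dict of dicts.

    neighbours: dict mapping node -> list of neighbouring nodes (the PPI)
    expr:       dict mapping node -> positive normalised expression level E_i
    """
    p = {}
    for i, nbrs in neighbours.items():
        total = sum(expr[k] for k in nbrs)          # sum of E_k over k in N(i)
        p[i] = {j: expr[j] / total for j in nbrs}   # p_ij = E_j / total
    return p


# Hypothetical toy network: hub "a" linked to "b", "c" and "d"; "b" and "c" linked.
toy_neighbours = {"a": ["b", "c", "d"], "b": ["a", "c"], "c": ["a", "b"], "d": ["a"]}
toy_expr = {"a": 8.0, "b": 2.0, "c": 4.0, "d": 2.0}

P = stochastic_matrix(toy_neighbours, toy_expr)
row_sums = {i: sum(row.values()) for i, row in P.items()}
```

Only pairs that are neighbours in the network receive an entry, so missing entries correspond to zero interaction probability.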


Clearly, if $j \notin N(i)$, then $p_{ij} = 0$. From this stochastic matrix one can then
construct a local signaling entropy ($LS$) as

$$ LS_i = - \sum_{j \in N(i)} p_{ij} \log p_{ij} \tag{2} $$


which reflects the level of uncertainty or redundancy in the local interaction
probabilities. We note that the above expression for the local entropy is not normalised,
so that the maximum possible entropy depends on the degree ($k_i$) of the node. In fact,

$$ \max LS_i = \log k_i \tag{3} $$
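The degree dependence of the maximum can be checked numerically. The following sketch is illustrative only: the helper `local_entropy` and the two probability rows are hypothetical. It evaluates the local entropy for a node with four neighbours, once with uniform and once with skewed interaction probabilities. The rescaling by $\log k_i$ in the last line is our own addition, shown only to make the normalisation issue concrete.

```python
import math


def local_entropy(p_row):
    """LS_i = -sum_j p_ij log p_ij over the neighbours of one node."""
    return -sum(p * math.log(p) for p in p_row.values() if p > 0.0)


# Hypothetical node with k_i = 4 neighbours.
uniform_row = {"n1": 0.25, "n2": 0.25, "n3": 0.25, "n4": 0.25}
skewed_row = {"n1": 0.85, "n2": 0.05, "n3": 0.05, "n4": 0.05}

ls_uniform = local_entropy(uniform_row)  # attains the maximum log(k_i)
ls_skewed = local_entropy(skewed_row)    # lies strictly below log(k_i)

# Illustrative rescaling to the unit interval (an assumption, not used in the text).
ls_skewed_normalised = ls_skewed / math.log(len(skewed_row))
```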


Finally, the signaling entropy rate, $SR$, is defined in terms of the stationary
distribution (or invariant measure) $\pi$ of the stochastic matrix ($\pi p = \pi$), as [24]

$$ SR = \sum_i \pi_i LS_i \tag{4} $$

i.e. this global signaling entropy rate is a weighted average of the local entropies
$LS_i$. We note that although $LS_i$ is independent of the expression level of gene $i$,
the gene's contribution to the entropy rate, i.e. $\pi_i LS_i$, is not. This is because
$\pi_i$ depends on the gene's expression level. In this work we refer to the term
$LSR_i = \pi_i LS_i$ as the local entropy rate of gene $i$, whereas $LS_i$ is just the
gene's local entropy.
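The distinction between $LS_i$ and $LSR_i$ can be illustrated with a small sketch, under stated assumptions: the toy network and expression values are hypothetical, and the stationary distribution is approximated by a damped power iteration, which is one of several possible numerical choices and not necessarily the one used in this work. Doubling the expression of gene "a" leaves its local entropy unchanged but alters its local entropy rate.

```python
import math


def stochastic_matrix(neighbours, expr):
    """p_ij = E_j / sum_{k in N(i)} E_k, stored as a dict of dicts."""
    return {i: {j: expr[j] / sum(expr[k] for k in nbrs) for j in nbrs}
            for i, nbrs in neighbours.items()}


def stationary_distribution(p, n_iter=2000):
    """Approximate pi with pi p = pi by damped (lazy) power iteration."""
    nodes = list(p)
    pi = {i: 1.0 / len(nodes) for i in nodes}
    for _ in range(n_iter):
        flow = {i: 0.0 for i in nodes}
        for i in nodes:
            for j, p_ij in p[i].items():
                flow[j] += pi[i] * p_ij              # (pi p)_j
        # Averaging with the previous iterate avoids oscillations on periodic chains.
        pi = {i: 0.5 * (pi[i] + flow[i]) for i in nodes}
    return pi


def analyse(neighbours, expr):
    """Return local entropies LS_i, local entropy rates LSR_i and the entropy rate SR."""
    p = stochastic_matrix(neighbours, expr)
    pi = stationary_distribution(p)
    ls = {i: -sum(q * math.log(q) for q in row.values() if q > 0.0)
          for i, row in p.items()}
    lsr = {i: pi[i] * ls[i] for i in p}
    return ls, lsr, sum(lsr.values())


# Hypothetical toy network and expression profiles (gene "a" doubled in the second).
toy_neighbours = {"a": ["b", "c", "d"], "b": ["a", "c"], "c": ["a", "b"], "d": ["a"]}
ls_lo, lsr_lo, sr_lo = analyse(toy_neighbours, {"a": 8.0, "b": 2.0, "c": 4.0, "d": 2.0})
ls_hi, lsr_hi, sr_hi = analyse(toy_neighbours, {"a": 16.0, "b": 2.0, "c": 4.0, "d": 2.0})
```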


The maximum entropy rate 


Given a connected network, the maximum entropy rate, $maxSR$, over the network
does not depend on the gene expression data but only on the adjacency matrix of the
network. In fact, the maximum entropy rate is attained for a stochastic matrix $p_{ij}$
given by

$$ p_{ij} = \frac{A_{ij} v_j}{\lambda v_i} \tag{5} $$

where $v$ and $\lambda$ are the dominant right eigenvector and eigenvalue of the adjacency
matrix $A$, respectively. Thus, it is important to note that the configuration of maximal
local entropy, i.e. the configuration where for each node $i$, $p_{ij} = A_{ij}/k_i$ and
$LS_i = \log k_i$, is not the configuration of maximal global entropy.
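The difference between the two configurations can be made concrete on a small graph. The sketch below is illustrative only: the toy network and function names are hypothetical, and the dominant eigenpair is obtained by a shifted power iteration, our choice of numerical method. It builds the stochastic matrix of equation (5) and compares its entropy rate with that of the maximal local entropy configuration. It relies on two standard facts for a symmetric adjacency matrix: the stationary distribution of the walk in equation (5) is proportional to $v_i^2$, and its entropy rate equals $\log \lambda$.

```python
import math


def dominant_eigenpair(neighbours, n_iter=1000):
    """Power iteration on A + I, which shares eigenvectors with A and
    also converges on bipartite graphs."""
    v = {i: 1.0 for i in neighbours}
    for _ in range(n_iter):
        w = {i: v[i] + sum(v[j] for j in neighbours[i]) for i in neighbours}
        top = max(w.values())
        v = {i: w_i / top for i, w_i in w.items()}
    i0 = max(v, key=v.get)
    lam = sum(v[j] for j in neighbours[i0]) / v[i0]   # (A v)_i / v_i
    return lam, v


def entropy_rate(p, pi):
    """SR = sum_i pi_i LS_i for a stochastic matrix p and stationary distribution pi."""
    return -sum(pi[i] * q * math.log(q) for i, row in p.items() for q in row.values())


# Hypothetical toy network: a triangle a-b-c with a pendant node d attached to a.
toy = {"a": ["b", "c", "d"], "b": ["a", "c"], "c": ["a", "b"], "d": ["a"]}

# Maximum entropy rate configuration, equation (5).
lam, v = dominant_eigenpair(toy)
p_max = {i: {j: v[j] / (lam * v[i]) for j in nbrs} for i, nbrs in toy.items()}
z = sum(v_i * v_i for v_i in v.values())
pi_max = {i: v[i] * v[i] / z for i in toy}
sr_max = entropy_rate(p_max, pi_max)

# Maximal local entropy configuration: p_ij = 1/k_i, pi_i proportional to k_i.
total_degree = sum(len(nbrs) for nbrs in toy.values())
p_uni = {i: {j: 1.0 / len(nbrs) for j in nbrs} for i, nbrs in toy.items()}
pi_uni = {i: len(nbrs) / total_degree for i, nbrs in toy.items()}
sr_uni = entropy_rate(p_uni, pi_uni)
```

On this toy graph the two entropy rates differ, with `sr_max` the larger of the two.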


Perturbation simulation analysis 


In what follows we describe the perturbation analysis performed on Erdős-Rényi
and scale-free networks, as well as on the full real PPI network described earlier. The
calculation of the global signaling entropy rate is simplified significantly by the fact
that the stochastic matrix defined by equation (1) has the detailed balance property, i.e.
the stationary distribution obeys not only $\pi p = \pi$, but the more restrictive condition
$\pi_i p_{ij} = \pi_j p_{ji}$. Writing $x_i$ for the expression level of gene $i$, this
detailed balance condition can be shown to imply

$$ \pi_i = \frac{1}{F} x_i (Ax)_i \tag{6} $$

where $F$ is a normalisation constant and $(Ax)_i = \sum_{j \in N(i)} x_j$.
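A short numerical check of the closed form in equation (6) is sketched below. The toy network, expression values and function name are hypothetical. It verifies both detailed balance and stationarity for the stochastic matrix of equation (1).

```python
def stationary_from_detailed_balance(neighbours, x):
    """pi_i = x_i (Ax)_i / F, with (Ax)_i the summed expression of the neighbours of i."""
    ax = {i: sum(x[j] for j in nbrs) for i, nbrs in neighbours.items()}
    f = sum(x[i] * ax[i] for i in neighbours)          # normalisation constant F
    return {i: x[i] * ax[i] / f for i in neighbours}, ax


# Hypothetical toy network and expression values.
toy = {"a": ["b", "c", "d"], "b": ["a", "c"], "c": ["a", "b"], "d": ["a"]}
x = {"a": 8.0, "b": 2.0, "c": 4.0, "d": 2.0}

pi, ax = stationary_from_detailed_balance(toy, x)
p = {i: {j: x[j] / ax[i] for j in nbrs} for i, nbrs in toy.items()}  # equation (1)

# Detailed balance: pi_i p_ij = pi_j p_ji on every edge.
max_imbalance = max(abs(pi[i] * p[i][j] - pi[j] * p[j][i]) for i in toy for j in toy[i])

# Stationarity: (pi p)_j = pi_j, using the symmetry of the neighbour lists.
pi_next = {j: sum(pi[i] * p[i][j] for i in toy[j]) for j in toy}
```

Because the closed form requires only neighbour sums, the stationary distribution is obtained without any eigenvector computation, which is what makes the perturbation scans over all nodes cheap.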

The initial configuration for the perturbative analysis is that of maximal local
entropy for each node in the network, which, as explained previously, does not
represent the state of global maximum entropy. To construct this initial configuration we
set the expression level of each gene/node to be identical, $x_i = x$. Thus, in the
initial configuration, $(Ax)_i = k_i x$, and from detailed balance we obtain for the
stationary distribution that

$$ \pi_i = \frac{1}{F} x_i (Ax)_i = \frac{k_i}{V \bar{k}} \tag{7} $$

where $V$ is the number of nodes in the network and where $\bar{k}$ is the average degree.
As far as the entropy is concerned, the local entropy of each node $i$ is simply $\log k_i$,
so the initial entropy rate is simply




$$ SR_0 = \frac{1}{V \bar{k}} \sum_i k_i \log k_i \tag{8} $$
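Since the initial entropy rate depends only on the degree sequence, it can be evaluated directly. The sketch below uses hypothetical degree sequences and a hypothetical function name. For a star graph with $V$ nodes the formula reduces to $\frac{1}{2} \log (V - 1)$, and for a $k$-regular graph to $\log k$, which provides two simple sanity checks.

```python
import math


def initial_entropy_rate(degrees):
    """SR_0 = sum_i k_i log k_i / (V * mean degree), where V * mean degree = sum_i k_i."""
    total_degree = sum(degrees)
    return sum(k * math.log(k) for k in degrees) / total_degree


sr0_star = initial_entropy_rate([4, 1, 1, 1, 1])  # star graph: one hub, four leaves
sr0_ring = initial_entropy_rate([2, 2, 2, 2, 2])  # ring graph: 2-regular
sr0_toy = initial_entropy_rate([3, 2, 2, 1])      # triangle with one pendant node
```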

Now let us consider perturbing a gene in the network by altering its expression level
by an amount $\lambda$. Without loss of generality we label the perturbed node by the index
"1", so that after perturbation, the expression levels in the network are described by
$x'_i = x + \delta_{i1} \lambda$. The new stationary distribution then becomes

$$ \pi'_1 \propto (x + \lambda) k_1 x \tag{9} $$

$$ \pi'_i \propto x \left( x + \lambda + (k_i - 1) x \right) \qquad \forall i \in N(1) \tag{10} $$

$$ \pi'_i \propto k_i x^2 \qquad \forall i \notin N(1) \cup \{1\} \tag{11} $$

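For the local entropies, we get

$$ LS'_i = LS_i \qquad \forall i \notin N(1) \tag{12} $$

$$ LS'_i = - \sum_{j \in N(i) \setminus \{1\}} p'_{ij} \log p'_{ij} \tag{13} $$

$$ \qquad\quad - p'_{i1} \log p'_{i1} \qquad \forall i \in N(1) \tag{14} $$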


where for $i \in N(1)$, $p'_{i1} = (x + \lambda)/(x + \lambda + (k_i - 1)x)$ and
$p'_{ij} = x/(x + \lambda + (k_i - 1)x)$ ($j \neq 1$). Thus, the change in the entropy rate,
$\Delta SR = SR' - SR_0$, is easily computable following any perturbation.
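The single-node perturbation scan can be sketched as follows. This is an illustration under stated assumptions rather than the analysis code: the toy network and function names are hypothetical, and the entropy rate is recomputed in full from the detailed balance form of the stationary distribution instead of updating only the terms in equations (9) to (14). The values $x = 2$ and $\lambda = 14$ are those of an activating perturbation.

```python
import math


def entropy_rate(neighbours, x):
    """SR = sum_i pi_i LS_i, with pi_i = x_i (Ax)_i / F and p_ij = x_j / (Ax)_i."""
    ax = {i: sum(x[j] for j in nbrs) for i, nbrs in neighbours.items()}
    f = sum(x[i] * ax[i] for i in neighbours)
    sr = 0.0
    for i, nbrs in neighbours.items():
        pi_i = x[i] * ax[i] / f
        ls_i = -sum((x[j] / ax[i]) * math.log(x[j] / ax[i]) for j in nbrs)
        sr += pi_i * ls_i
    return sr


def delta_sr(neighbours, node, x0, lam):
    """Change in entropy rate when one node is shifted from x0 to x0 + lam (> 0)."""
    base = {i: x0 for i in neighbours}
    perturbed = dict(base)
    perturbed[node] = x0 + lam
    return entropy_rate(neighbours, perturbed) - entropy_rate(neighbours, base)


# Hypothetical toy network: a triangle a-b-c with a pendant node d attached to a.
toy = {"a": ["b", "c", "d"], "b": ["a", "c"], "c": ["a", "b"], "d": ["a"]}

sr0 = entropy_rate(toy, {i: 2.0 for i in toy})            # unperturbed entropy rate
deltas = {node: delta_sr(toy, node, x0=2.0, lam=14.0) for node in toy}
by_degree = sorted((len(toy[node]), d) for node, d in deltas.items())
```

Pairing each $\Delta SR$ with the degree of the perturbed node, as in `by_degree`, gives the kind of entropy rate against degree view used in the perturbation figures. A four-node toy graph is far too small to say anything about the behaviour on Erdős-Rényi, scale-free or PPI networks.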

In the actual analysis, when performing activating perturbations, we set $x = 2$ and
$\lambda = 14$, whilst, when modeling inactivating perturbations, we set $x = 16$ and
$\lambda = -14$. These values are typical for logged Affymetrix or Illumina data, with highly
expressed genes normally exhibiting values larger than 12, and lowly expressed
genes showing values smaller than 4.

Fig. 1 Increased entropy in liver cancer is driven by increased entropy at hubs: A) Boxplots
comparing the entropy rate (SR) of 50 normal liver samples (N) to 50 matched liver cancer
specimens (C), derived from RNA-Seq data of the TCGA consortium. P-value is from a one-tailed
Wilcoxon rank sum test, testing the hypothesis that entropy rate is higher in cancer. Also shown
is the SR between normal and liver cancer for a case where the gene expression profiles were
randomly permuted (perm) over the interaction network. Observe how the difference in the SR
between normal and cancer is reduced and even takes an opposite directionality, demonstrating
that the interplay between gene expression changes and network topology is dictating the higher
signaling entropy in cancer. B) Boxplots showing the change in the mean local entropy rate (LSR)
($\langle \pi_i LS_i \rangle_C - \langle \pi_i LS_i \rangle_N$) between normal and cancer of each
node (gene) as a function of node degree, positive values indicating higher values in cancer.
C) Scatterplot of the differential change in the mean local entropy rate against the differential
change in the mean invariant measure (INVP) ($\langle \pi_i \rangle_C - \langle \pi_i \rangle_N$).
Each data point is one node (gene). D) Boxplots showing the change in the mean local entropy (LS)
of each node (gene) ($\langle LS_i \rangle_C - \langle LS_i \rangle_N$) between normal and cancer,
as a function of node degree.

Fig. 2 Preferential overexpression of hub genes in cancer: A) Boxplot showing the local
entropy rate (LSR) against normal/cancer status, for the hub gene (BUB1) exhibiting the largest
increase in the local entropy rate. P-value is from a Wilcoxon rank sum test. B) Scatterplot of gene
expression values between a representative normal (x-axis) and cancer (y-axis) sample for the gene
showing the largest increase in the local entropy rate (gene BUB1, marked in red) and that of its
neighbours in the PPI network (over 800 neighbours, shown in black). C) Boxplot of the average
difference in gene expression between normal and cancer (positive values indicate higher
expression in cancer) against node-degree class. Observe how the highest-degree hubs show
preferentially increased expression in cancer, whereas the largest reductions in expression target
low-degree nodes. D) Density plot of the average difference in gene expression between normal and
cancer for two classes of genes: hubs (defined as nodes of degree > 316) and nodes of degree 1 (k=1).
The number of each is indicated, and the P-value is from a Kolmogorov-Smirnov test, testing for a
difference in their statistical distributions.
Fig. 3 Effect of cancer perturbations on signaling entropy: A) Examples of two expression
perturbations typically found in cancer. Top depicts the example of an oncogenic hub undergoing
overexpression in cancer, which has the effect of drawing in signaling flux from a neighbour i.
Example at the bottom depicts the underexpression of a low-degree "tumour suppressor" node (e.g.
a transcription factor), which from the perspective of node i causes, indirectly, an increased
signaling flux through the nearby hub. B) Perturbation analysis of the top 100 genes ranked according
to fold-change between normal and liver cancer. Plot shows the entropy rate after perturbation
(y-axis) against node-degree (x-axis), with colors indicating over or underexpression. Black
horizontal line defines the entropy rate of the average expression profile of normal liver (i.e.
before the perturbation).

Fig. 4 Cancer perturbations may increase the entropy rate on networks with scale-free
topology but not on random Poisson graphs: A) A cartoon of the network perturbation analysis: each
node i of the network is perturbed in turn by changing its expression value. The case of
overexpression is here indicated in red. The increased expression draws in signaling flux from
neighbours (only one perturbed edge is shown). The entropy rate of the network after perturbing
node i, $SR_i$, is computed and compared to the entropy rate SR of the original unperturbed network.
For n nodes in the network we get a distribution of entropy rate changes ($SR_i - SR$, $i = 1, \ldots, n$).
B) Zoomed-in version of a network perturbation, whereby a node i undergoes a perturbation (here
overexpression). From the perspective of a neighbouring node j, the perturbation causes a low
signaling entropy configuration around node j. The key question is how this perturbation affects the
global entropy rate. C) Perturbation analysis result, in which each node (gene) of the network was
perturbed through overexpression (red) or underexpression (green). Plotted is the global entropy rate
(SR) after the perturbation (y-axis) against the degree of the perturbed node (x-axis), for 3 different
networks: Erdős-Rényi (ER) graph, scale-free (SF) network and the full PPI network (PPI). Black
dashed line denotes the entropy rate before the perturbation. In each plot there are as many data
points as there are nodes in the network, each value corresponding to the perturbation of only one
node. Number of nodes (nn), average degree (avK) and median degree (medK) are given.

Fig. 5 Entropy rate increase in cancer requires the scale-free topology of the PPI network: A)
Boxplots of the entropy rate (SR) for the 50 normal liver and 50 liver cancer samples as evaluated
on the original full PPI network (left), as well as on an equivalent Erdős-Rényi graph (middle).
P-values are from a Wilcoxon rank sum test. Corresponding ROC curves and AUC values (right).
B) As A) but for TCGA RNA-Seq data from 27 colon cancers and 27 matched normals. C) As A)
but for TCGA RNA-Seq data from 52 prostate cancers and 52 matched normals.
